A Web-Platform for Preserving, Exploring, Visualising, and Querying Linguistic Corpora and other Resources

Authors

Georg Rehm

Oliver Schonefeld

Andreas Witt

Christian Chiarcos

Timm Lehmberg

Abstract

We present SPLICR, the Web-based Sustainability Platform for Linguistic Corpora and Resources. The system is aimed at people who work in Linguistics or Computational Linguistics: a comprehensive database of metadata records can be explored in order to find language resources that could be appropriate for one's specific research needs. SPLICR also provides a graphical interface that enables users to query and to visualise corpora. The project in which the system is developed aims at sustainably archiving the ca. 60 language resources that have been constructed in three collaborative research centres. Our project has two primary goals: (a) to process and sustainably archive the resources so that they are still available to the research community in five, ten, or even 20 years' time; (b) to enable researchers to query the resources both at the level of their metadata and at the level of linguistic annotations. In more general terms, our goal is to enable solutions that leverage the interoperability, reusability, and sustainability of heterogeneous collections of language resources.

Similar resources

Ontology-Based XQuery'ing of XML-Encoded Language Resources on Multiple Annotation Layers

We present an approach for querying collections of heterogeneous linguistic corpora that are annotated on multiple layers using arbitrary XML-based markup languages. An OWL ontology provides a homogenising view on the conceptually different markup languages so that a common querying framework can be established using the method of ontology-based query expansion. In addition, we present a highly...

Tools and Resources for Visualising Conversational-Speech Interaction

This paper describes tools and techniques for accessing large quantities of speech data and for the visualisation of discourse interactions and events at levels above that of linguistic content. We are working with large quantities of dialogue speech including business meetings, friendly discourse, and telephone conversations, and have produced web-based tools for the visualisation of non-verba...

Searchable Metaspaces

The purpose of this presentation is to start a discussion about methodological and operational requirements for developing tools for internet browsing and/or querying of meta-descriptions of language resources, in particular multimodal corpora. Among the most important requirements are: delimiting the relationship both between meta-descriptions and the resources they apply to, and between brows...

A new Ontology Lookup Service at EMBL-EBI

The use of biomedical ontologies for the annotation, integration and analysis of biological data is now well established in bioinformatics. The range and diversity of ontologies has increased dramatically over the last ten years and community efforts such as the OBO foundry have been instrumental in coordinating this activity. The demand for unified mechanisms for accessing large collections...

ZT Corpus: Annotation and Tools for Basque Corpora

The ZT Corpus (Basque Corpus of Science and Technology) is a tagged collection of specialised texts in Basque, which aims to be a major resource in research and development with respect to written technical Basque: terminology, syntax and style. It was released in December 2006 and can be queried at http://www.ztcorpusa.net. The ZT Corpus stands out among other Basque corpora for many reasons: ...

Journal title:

Procesamiento del Lenguaje Natural

Volume 41

Publication date: 2008
